REMARKS

Claims 1, 2, 13-17, 51-58, 60, and 62-80 are pending in the application. Claims 3-12, 18-50,
59, and 61 have been canceled without prejudice, claims 1 and 2 have been amended, and
new claims 62-80 have been added. Claims 59 and 61 were canceled without prejudice as drawn 
to a non-elected invention ("VDDRg2" polynucleotides; Group II). Similarly, claim 2 has been 
amended to delete SEQ ID NO: 3, which is also a sequence of non-elected Group II. Support for 
the amendments and new claims can be found, e.g., in original claims 1-9 and in the specification 
at page 3, line 18, to page 4, line 21; page 5, line 26, to page 6, line 7; and Figs. 12 and 13. 
These amendments and new claims add no new matter. 

35 U.S.C. § 102(e)

On pages 3-4 of the Office Action, the Examiner rejected claims 1, 2, 8, 9, 13-17, 51-58,
and 60 as allegedly anticipated by Kausch et al., U.S. Patent No. 5,508,164 ("Kausch").

According to the Examiner, 

Kausch et al. disclose the isolation of chromosome (column 5). The cell source 
are human cells (column 6, lines 5-15). Many chromosomes can be sorted at once 
(column 9, lines 29-43). Large amounts of pure chromosomes and DNA of the 
chromosomes is isolated (column 10, lines 22-25). Cells transfected with 
chromosomal DNA is disclosed (column 10, lines 22-25). 

Claims encompass chromosomal DNA because the claims encompass 
polynucleotide sequence comprising the polynucleotide sequence encoding a 
polypeptide with SEQ ID NO:2 including the genomic DNA of SEQ ID NO:3. 
The isolated and purified chromosomes comprise the polynucleotide sequence 
encoding a polypeptide with SEQ ID NO:2. Chromosomal DNA inherently are 
operably linked to an expression control sequences. Chromosomal DNA 
inherently comprises heterologous sequence because it undergoes recombination. 

Applicants respectfully traverse the rejection in view of the claim amendments and the 
following remarks. 
Amended independent claim 1 is directed to an isolated nucleic acid comprising a nucleic
acid sequence contiguously encoding a polypeptide comprising amino acid residues 39 to 115 or
141 to 434 of the human pregnenolone activated receptor (hPAR) polypeptide of SEQ ID NO:2.

The coding region of the hPAR gene extends over several non-contiguous exons that are
interrupted by several non-coding intronic sequences (see Fig. 5 of the enclosed publication of
Zhang et al. (2004) Genome Research 14:580-590 ("Exhibit A")). Furthermore, the DNA-binding
domain (DBD) and the ligand-binding domain (LBD) of the hPAR polypeptide, which
correspond respectively to amino acid residues 39 to 115 and 141 to 434 of SEQ ID NO:2, are
each encoded by two or more non-contiguous exons separated by one or more introns (hPAR,
also known as PXR, belongs to the "NR1I" grouping depicted in the gene structure diagram of
Fig. 5 of Exhibit A). Human chromosomal DNA, whether purified or naturally occurring, thus
lacks a nucleic acid sequence contiguously encoding the hPAR polypeptide or the DBD or LBD
of the hPAR polypeptide. Accordingly, Kausch's disclosure of chromosomal DNA does not
anticipate the nucleic acid of independent claim 1 or the claims that depend therefrom.
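As a purely arithmetical illustration of what "contiguously encoding" residues 39 to 115 or
141 to 434 entails, the sketch below maps an inclusive residue range to the nucleotide span that
encodes it, using the fact that residue i is encoded by codon i. It assumes an uninterrupted
reading frame and counts from the first base of the start codon; the position of the start codon
within SEQ ID NO:1 is not stated here, so these are coding-sequence coordinates, not SEQ ID NO:1
coordinates, and the function name is hypothetical.

```python
def residues_to_cds_span(first_res: int, last_res: int) -> tuple[int, int]:
    """Return the 1-based, inclusive nucleotide span encoding residues
    first_res..last_res, counted from the first base of the start codon.

    Hypothetical helper: codon i occupies nucleotides 3i-2 through 3i.
    """
    if not 1 <= first_res <= last_res:
        raise ValueError("expected 1 <= first_res <= last_res")
    return 3 * first_res - 2, 3 * last_res


for first, last in [(39, 115), (141, 434)]:
    start, end = residues_to_cds_span(first, last)
    length = end - start + 1
    print(f"residues {first}-{last}: nt {start}-{end} ({length} nt, {length // 3} codons)")
# residues 39-115: nt 115-345 (231 nt, 77 codons)
# residues 141-434: nt 421-1302 (882 nt, 294 codons)
```

In a cDNA each span is a single uninterrupted stretch of nucleotides; in the genomic hPAR locus
the same codons are divided among exons, as discussed above.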

Amended independent claim 2 is directed to an isolated nucleic acid comprising the
nucleotide sequence of the hPAR cDNA of SEQ ID NO:1. The hPAR cDNA is a DNA copy of
the hPAR mRNA and thus lacks the introns that interrupt the coding sequence in the hPAR gene.
The hPAR cDNA sequence (SEQ ID NO:1) recited in claim 2 therefore constitutes an intronless
sequence that is not present in human chromosomal DNA. Accordingly, Kausch does not
anticipate the nucleic acid of independent claim 2 or the claims that depend therefrom.
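The point that an intronless cDNA sequence need not appear anywhere in the corresponding
genomic DNA can be illustrated with a deliberately tiny, made-up example. The sequences and
the `splice` helper below are hypothetical and are not hPAR sequences; the sketch simply joins
exon intervals, as a cDNA copy of a spliced mRNA would, and checks whether the result occurs as
an uninterrupted string in the genomic sequence.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open) in order, dropping the introns."""
    return "".join(genomic[start:end] for start, end in exons)


# Hypothetical toy gene: two coding exons separated by one intron (GT...AG ends).
exon1 = "ATGGCTG"
intron = "GTAAGTCCCTTTTTAG"
exon2 = "ATAAATGA"
genomic = exon1 + intron + exon2

exons = [(0, len(exon1)), (len(exon1) + len(intron), len(genomic))]
cdna = splice(genomic, exons)

print(cdna)             # ATGGCTGATAAATGA
print(cdna in genomic)  # False: the spliced sequence is absent from the toy genomic DNA
```

Whether a given spliced sequence happens to recur elsewhere in a genome is an empirical
question; the sketch only shows how removing introns yields a contiguous sequence that the
genomic locus itself does not contain.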

New independent claim 63 is directed to a recombinant nucleic acid comprising a nucleic 
acid sequence encoding a polypeptide comprising amino acid residues 39 to 115 or 141 to 434 of
SEQ ID NO:2. A "recombinant" nucleic acid refers to linked sequences that do not naturally 
occur within the same molecule. Kausch nowhere discloses a recombinant nucleotide sequence 
encoding the hPAR polypeptide (or the DBD or LBD of the hPAR polypeptide) and therefore 
does not anticipate the nucleic acid of independent claim 63 or the claims that depend therefrom. 

New independent claim 73 is directed to an expression vector comprising a nucleic acid 
comprising a nucleic acid sequence encoding a polypeptide comprising amino acid residues 39 to 
115 or 141 to 434 of SEQ ID NO:2. Nowhere does Kausch disclose an expression vector
encoding a polypeptide containing the DBD or LBD of hPAR as is required by claim 73. 
Accordingly, Kausch does not anticipate the expression vector of independent claim 73 or the 
claims that depend therefrom. 

In light of these comments and claim amendments, Applicants request that the Examiner
withdraw the rejection. 



Applicants submit that all grounds for rejection have been overcome, and that all claims 
are now in condition for allowance, which action is requested. 

Enclosed is a Petition for Three Month Extension of Time and checks for the extension of 
time fee and excess claims fee. Please apply any other charges or credits to deposit 
account 06-1050, referencing Attorney Docket No. 17808-002001. 



Fish & Richardson P.C. 
45 Rockefeller Plaza, Suite 2800 
New York, New York 10111 
Telephone: (212)765-5070 
Facsimile: (212) 258-2291 
Genomic Analysis of the Nuclear Receptor Family:
New Insights Into Structure, Regulation, 
and Evolution From the Rat Genome 

Zhengdong Zhang,1 Paula E. Burch,1 Austin J. Cooney,2 Rainer B. Lanz,2
Fred A. Pereira,2,3 Jiaqian Wu,1 Richard A. Gibbs,1 George Weinstock,1,4 and
David A. Wheeler1,5

1 Human Genome Sequencing Center, Department of Molecular and Human Genetics, 2 Department of Molecular and Cellular 
Biology, 3 Huffington Center on Aging, Department of Otolaryngology, Baylor College of Medicine, Houston, Texas 77030, USA; 
4 Department of Microbiology and Molecular Genetics, University of Texas Medical School, Houston, Texas 77225, USA 

Completion of the Rattus norvegicus genome sequence enabled a global inventory and analysis of the nuclear receptors 
(NRs) in three mammalian species. Forty-nine NR members were found in mouse, 48 in human. Forty-seven were 
found in the rat, with gaps at the locations expected for the other two. Pairwise comparisons of their distribution in 
rat, mouse, and human identified 11 syntenic NR gene blocks, including three small clusters of two or three closely
related genes, each spanning 40 kb to 1700 kb. The exon structure of the ligand-binding domain suggests that exon 
shuffling has played a role in the evolution of this family. An invariant splice junction in all members of the NR 
family except LXRβ suggests a functional role for the intron. The ligand-binding domains of PXR and CAR are
among the most divergent in the family. Their higher nucleotide substitution rates may be related to the central role 
played by these two NRs in the metabolism of foreign compounds and may have resulted from limited positive
selection. 

[Supplemental material is available online at www.genome.org.] 



Nuclear receptors (NRs) are transcription factors capable of ex- 
erting regulation of gene expression in the nucleus in response to 
various extracellular and intracellular signals (Tsai and O'Malley 
1994; Mangelsdorf et al. 1995). They are activated by binding of 
small hydrophobic compounds, such as steroids, retinoids, and 
thyroid hormones. Ligand binding triggers a conformational 
change in the receptor proteins, which enables an interaction 
with cofactors and specific cis-regulatory DNA sequences called
hormone response elements (HREs) to subsequently modify gene 
expression. Cognate ligands are not identified for all nuclear re- 
ceptors. Those that currently lack identified ligand molecules are 
termed "orphan" NRs (Giguere 1999). Because NRs bind small 
molecules which can be easily modified by drug design, and regu- 
late a group of diverse and crucial biological functions such as 
metabolism, homeostasis, development, and disease, they have 
become promising pharmacological targets. 

NRs share a similar modular domain structure, which in- 
cludes, from N-terminus to C-terminus, the variable modulatory 
A/B domain, the DNA-binding domain (DBD), the hinge D- 
region, the ligand-binding domain (LBD), and an F-domain that 
is not found in all NRs. The DBD contains two zinc fingers in 
tandem that encompass ~80 amino acid residues in total and are
directly involved in recognition of the cognate HRE. The LBD 
harbors a hydrophobic ligand-binding pocket, deep within its 
core, that is specific to and thus variable among different recep- 
tors. The DBD and LBD are the two most conserved domains of 
NRs and, as a result, are regarded as dual signatures of this protein 
family. 

Corresponding author.

E-MAIL wheeler@bcm.tmc.edu; FAX (713) 798-6977.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2160004.



NRs constitute one of the largest groups of transcription 
factors in animals. Twenty-one NR genes are identified in the 
complete sequence of the Drosophila melanogaster genome (Adams 
et al. 2000), and over 270 are found in Caenorhabditis elegans 
(Sluder and Maina 2001). The latest estimate of the number in 
the human genome sequence, based on sequence alignment and 
phylogenetic analysis, is 49 NR genes and three NR pseudogenes 
(Robinson-Rechavi et al. 2001). In a detailed study of the evolu- 
tionary relationship among NRs (Laudet 1997), the majority of 
them were assigned to six well defined subfamilies whose inter- 
relationships remain unresolved. As a result of the work of Lau- 
det, a systematic naming convention was proposed (Nuclear Re- 
ceptors Committee 1999) including the creation of a new sub- 
family 0, which consists of the nuclear receptors lacking either 
the DBD or the LBD. 

With the draft rat genome sequence available (Rat Genome 
Sequencing Project Consortium 2004), it is now possible to con- 
duct a three-way study of the NR genes comparing the human, 
mouse, and rat. To gain new insights into the structure, regulation,
and evolution of this fascinating family, we sought to determine
their genomic location and their gene structure, and
re-evaluate their phylogenetic relationships in Homo sapiens and 
the two most medically important model systems, the mouse and the rat.

RESULTS AND DISCUSSION 

Nuclear Receptor Inventory in Rat, Mouse, 
and Human Genomes 

The presence of six NR domains was examined in the rat, mouse,
and human genomic sequences using GENEWISEDB (see Methods).
The numbers of NR domains identified in the three genomes are
summarized in Table 1 (see Supplemental Table 1 for a detailed
inventory and genomic coordinates). Grouping and subsequent
assignment of these domains to different NRs by BLAST revealed
that most of the known mammalian NR genes are present in the
current three genome sequences (Suppl. Table 1, Fig. 1); however,
the sequences encoding several receptors are partially or
completely missing in the rat and mouse genomes. The absence of
the sequences encoding Rev-erbβ (NR1D2) and PNR (NR2E3) and the
LBD of TLX (NR2E1) in the rat genome and the DBD of LXRβ (NR1H2)
in the mouse genome can be explained by gaps in these two
assemblies at the expected syntenic locations. The final tally of
complete or partially identified NR genes was 48 for human, 49 for
mouse, and 47 for rat.

Table 1. Numbers of Sequences Encoding Nuclear Receptors Found in Rat, Mouse, and Human Genomes(a)

Sequence                   Rat      Mouse    Human
NR genes                   47(b)    49       48
NR genes, aberrant(c)      0        1        1
NR pseudogenes             3        4        3
Domain singletons(d)
  DBD                      1        1        0
  LBD                      0        1        0
  SMD(e)                   0        0        1

(a) Genome versions: human, April 2003; mouse, February 2003; rat, April 2003.
(b) All complete and partial genes are included in the tally (see text). Not tallied are NR1D2
and NR2E3, whose complete absence correlates with sequence gaps in the rat genome assembly at
the location predicted by synteny with the mouse and human orthologs.
(c) One aberrant NR gene structure was identified in each of the human and the mouse genomes.
(d) Domain singletons, not including DAX-1 and SHP, which are subfamily 0 NR genes.
(e) Steroid modulatory A/B domain. One CCR domain singleton was found in the human genome.

Among the NR genes are also "domain singletons," the ge- 
nomic sequences encoding NR domains without nearby se- 
quences, or gaps, to make complete NR genes (Suppl. Table 1). 
They do not share sequence similarity with the single-domain NRs
DAX-1 (NR0B1) and SHP (NR0B2), two NRs known to lack a DBD.

Although some domain singletons might be a result of false 
positive identifications, others defy so quick a dismissal and re- 
main puzzling. For example, a 522-bp sequence identified on 
human chromosome 16 encodes a partial A/B domain of the 
glucocorticoid receptor (GR, NR3C1), and is 95% identical to a 
portion of the first coding exon of GR. The observation that the 
intron downstream of the first coding exon of GR harbors a
potentially active family-Y Alu element (Batzer et al. 1990) and that
the 522-bp partial copy is immediately surrounded by two Alu
elements from the families Y and C suggested that the creation
of this GR domain singleton may be related to the retrotranspo- 
sition activity of the nearby Alu-Y element. 

NR pseudogenes (ψ) were identified in each species (Table 2).
Our results confirmed the existence of the three known
pseudogenes in the human genome, including ψFXRβ, the only
unprocessed pseudogene (Maglich et al. 2001; Robinson-Rechavi et
al. 2001). Because the mouse and rat orthologs of ψFXRβ are
expressed (identified and experimentally proven to be active by
one of the authors, J.W., and Otte et al. 2003) and because FXRβ
may share some functions with FXR in cholesterol metabolism, it
remains unclear under what circumstances FXRβ was silenced and
how its loss was tolerated and fixed in the ancestral primate
population.



Four pseudogenes were detected in the mouse genome and 
three in the rat genome. Although there are two LRH1 pseudo- 
genes in both the mouse and rat genomes, it is likely that the two 
sets were created independently because there are no syntenic 
pairings, and they have marked differences in their sequence 
features (data not shown). 



Genomic Distribution of Nuclear Receptors 

The genomic locations of NRs were mapped onto the rat karyo- 
gram (Fig. 1). NR genes were distributed throughout the rat ge- 
nome except for chromosomes 9, 12, 14, and 17. Although rat 
chromosome Y was unavailable, no NR genes are found on the 
human Y chromosome, and none were expected there for rat or 
mouse. The Poisson test rejected a random distribution
(P < 0.001) of NRs in the rat genome. We identified 11 syntenic
blocks common to all three genomes; that is, in each block, the
same set of NR genes is located on a single chromosome in all
three genomes (Table 3). The sizes of these 11 blocks vary from
0.21 Mb to 54.33 Mb. Except for blocks I, II, and IV, all syntenic
blocks have similar sizes in all three genomes. RORγ (NR1F3) and
FXRβ (NR1H5) in block IV are less than 9 Mb apart in the rodent
genomes; however, they are separated by a 34-Mb interval that
includes the centromere in human.

Three tightly linked NR gene clusters stand out within the
syntenic blocks: cluster i, composed of TRα (NR1A1), RARα
(NR1B1), and Rev-erbα (NR1D1) from block VII; cluster ii, of TRβ
(NR1A2), Rev-erbβ (NR1D2), and RARβ (NR1B2) from block VIII;
and cluster iii, of SF1 (NR5A1) and GCNF1 (NR6A1), a subset of
block X. They span 270 kb, 1700 kb, and 40 kb, respectively, in
the rat genome. Salient features of clusters i and ii in the human
and rat genomes were described previously (Laudet et al. 1992;
Koh and Moore 1999). They are composed of closely related
paralogous triplets that must have arisen by duplication of an
ancestral TR, Rev-erb, and RAR gene cluster. The most remarkable
feature of cluster i, the overlap of the 3'-most exons of one
variant of TRα with Rev-erbα (Lazar et al. 1989), has not been
observed in the chicken (Forrest et al. 1990; Bonnelye et al. 1994).
In cluster ii, TRβ and Rev-erbβ do not share terminal exons (Koh
and Moore 1999).

The genome sequences bring details of this organization 
into focus. The gene order, spacing, and orientations are different 
in the extant clusters i and ii (Fig. 2). Although TR and Rev-erb 
maintain the same tail-to-tail orientation, the pair is inverted 
relative to RAR in the two clusters. Among these six genes, only 
TRα has two splice variants with downstream extended 3' coding
exons, that is, the ones overlapping Rev-erbα (see inset, Fig. 2A).
This would suggest that the TRβ gene structure reflects the
ancestral state of TR and therefore recruitment of the terminal exon
occurred as a result of the juxtaposition of the two NR genes. It 
will be interesting to determine whether this is a mammalian 
invention, as suggested by the negative findings in chicken, or is 
a general feature of vertebrates. 

Given the propensity for processes of chromosomal rear- 
rangement to scatter the majority of the NR genes, it is interest- 
ing that both clusters remained closely linked, suggesting that 
natural selection favors the clusters. All other syntenic groups of 
NR genes found here belong to a set of large syntenic blocks 
shared by the rat, mouse, and human genomes and may simply 
reflect the current state of the chromosomal organization on the 
whole-genome scale. Studies of segmental duplication suggest
that recent segmental duplication events have contributed
little to the evolution of the NRs in human, mouse, and rat,
as no NR genes or their functional domains are found in the large 
duplicated regions in the human and rat genomes (Bailey et al. 
2002; Tuzun et al. 2004). 
Figure 1 The chromosomal landscape of rat nuclear receptor genes. NR genes on the forward strand were placed on the right of the chromosomes,
and NR genes on the reverse strand were placed on the left. NR1D2, NR2E3, and the sequences encoding the LBDs of NR1B2 and NR2E1 are missing
due to sequence gaps in the current rat genome assembly. Their genomic locations are indicated in the square brackets (L for LBD). The syntenic blocks
containing NR genes are highlighted in green (see also Table 3).



Phylogenetic Analysis 

The NR DBDs and LBDs were tested separately to reinvestigate 
the possibility that recombination between the two domains ac- 
counted for some of the diversity in the NR family (see Laudet et 
al. 1992), and to enable investigation of the relationships with 
the NR0B group. The overall topologies of the two trees gave the
expected subfamily and group clades, upon which their system- 
atic nomenclature is based (Laudet 1997). They differed in small 



details in a way that could possibly be consistent with one or 
more exchange events between these two domains early in NR 
history. For example, subfamily NR4 was closer to NR1 in the 
LBD tree (Fig. 3A) but was closer to NR5 in the DBD tree (Fig. 3C). 
However, 68% bootstrap support was marginal for the DBD con- 
figuration. This issue warrants further investigation. 

SHP (small heterodimer partner) and DAX-1 (dosage-sensitive
sex and AHC critical region on the X, gene 1) of the NR0B group
were thought to possibly represent an ancient gene structure (cf.
Guo et al. 1996) because of their high degree of divergence from
other NRs. Our results place the NR0B group closest to NR2C (TR2
and TR4) with strong bootstrap support for this configuration,
suggesting that they arose, by the loss of the DBD, during or
after the duplications that expanded the NR2 subfamily. They
subsequently evolved much more rapidly than the other NR2
members, as indicated by long branch lengths after divergence
from NR2C, freed from functional constraints presumably imposed
by the DNA binding requirement. They now act as modulators of
other NRs through a variety of protein-protein interactions
(e.g., Johansson et al. 2000; Zhang and Chiang 2001; Gurates et
al. 2003).

Table 2. Human and Rodent Nuclear Receptor Pseudogenes

Genome   Pseudogene    Location            Type            Truncation(a)
Human    ψFXRβ         chr1+:114480335     unprocessed     no    no
         ψERRα         chr13-:55510764     processed       yes   yes
         ψRev-erbβ     chr13-:19064728     processed       no    no
Mouse    ψPNR          chr19+:39472488     semiprocessed   no    no
         ψLRH1         chr15+:35760537     processed       yes   no
         ψLRH1         chr3+:145441245     semiprocessed   yes   no
         ψERRβ         chr6-:119298331     processed       yes   no
Rat      ψLRH1         chrX+:146791239     processed       no    no
         ψLRH1         chr11+:48636215     processed       yes   no
         [illegible]   chrX-:31030327      processed       yes   no

(a) Truncation is relative to the coding sequences.

The LBDs of most NR members have changed little since the 
divergence of humans and rodents. This is manifested in the tree 
as extremely short terminal branch lengths, that is, those
branches leading from the last common ancestor of the three
species to each species. However, three groups (NR1I2-3, NR1H5,
and NR0B1-2; see Fig. 3A, shaded groups) were significantly more
divergent among the three species. Nucleotide substitution
analysis revealed that the synonymous rates in the LBDs of CAR
(NR1I3) and PXR (NR1I2) were average for the family, whereas the
nonsynonymous rates were 6.4 and 3.7 times higher than the average
(Suppl. Fig. 1).

Evaluation of the terminal branch lengths of all NR members
revealed cases where the rat sequence was closer to human than
the mouse was: RARα (NR1B1), GR (NR3C1), and LXRα (NR1H3). For
many other members, the human, rat, and mouse were virtually
indistinguishable. These observations may be of practical value
in choosing model systems for pharmacological studies.

There was too little variation in the ~80-aa DBD to form a
well resolved tree, so DNA sequences were used for this domain. 
The terminal branches were again of most interest to the inter- 
species comparison. Long terminal branches were observed for 
all but two NR members: COUP-TFI (NR2F1) and COUP-TFII 
(NR2F2; Fig. 3B, shaded portion). The relative absence of inter- 
species variation in the DNA encoding these two NR domains 
suggested the possibility of selection operating on the DNA se- 
quence itself. Conserved regulatory sequences could be one ex- 
planation for this observation. It may therefore be significant that 
the DBD is uninterrupted by introns in these two NRs (see below). 

The KA/KS ratios of the LBDs indicate that the NRs are subject
to strong purifying selection. No positive selection was detected
by Student's t-test. However, because the KA/KS ratios of the
LBDs of PXR and CAR were 4.0 and 5.6 times greater than the
averages, respectively, these two domains may have experienced
limited positive selection in the context of NR evolution. For
PXR and CAR, the increased KA/KS ratios in the LBDs could be more
readily explained by their biological functions. PXR, an orphan
NR preferentially expressed in the liver and intestine, responds
to potentially harmful chemicals by activating the expression of
cytochrome P-450 genes crucial for the detoxification of a wide
variety of structurally diverse xenobiotics and endobiotics
(Kliewer et al. 1998; Lehmann et al. 1998). The KA/KS ratios of
the remaining orphans were much more conserved, and thus their
ligands, if any, are not likely to be species-specific.

Table 3. Syntenic Blocks Containing NR Genes in Rat, Mouse, and Human Genomes

(a) The size of the rat span was estimated from the location of the gap situated at the expected
location of NR2E3 based on syntenic flanking mouse genes Pkm2 and NR1F1.
(b) The size of the rat span was estimated from the location of the gap situated at the expected
location of NR1D2 based on syntenic flanking mouse genes Rpl15 and NR1A2 (see Fig. 2 legend).

Figure 2 Chromosomal location of related NR gene clusters i and ii. Genes are labeled on each figure, and closely related paralogs are similarly shaded.
Coordinate positions for each chromosome are indicated below the number lines. Gene arrow lengths are proportional to the size of each gene. (A)
Cluster i spans ~270 kb. The inset gives a scale drawing of the relationship of the NR1A1 and 1D1 genes. The three known variants of NR1A1 are shown.
Coding exons are shaded boxes, 3' UTRs are open. A filled inverted triangle marks the splice acceptor of the invariant LBD splice junction (see also Fig.
3B). (B) Cluster ii, ~1.4 Mb. The rat gene for NR1D2 is only presumed to exist at the indicated position. Sequences for this gene are absent from the
assembly, and a gap exists at this position (indicated by the broken line). The rat NR1B2 is a partial gene, containing a DBD but not an LBD, most likely
as a result of incomplete assembly of this draft genome. Note that the 1A and 1D genes are on opposite strands in each cluster, in the same orientation
relative to each other; their order changes relative to the 1B gene.

We investigated the structural implications of the LBD se- 
quence variation in the PXR group. Thirty-three variable sites in 
the multiple sequence alignment of the LBDs of PXR from hu- 
man, mouse, rat, rhesus, pig, rabbit, dog, chicken, and zebrafish 
were mapped on the tertiary structure of the LBD of the human 
PXR (Watkins et al. 2003). Seven sites line the inner surface of the 
ligand-binding pocket (Watkins et al. 2001); eight variable sites 
were distributed along α-helix 9 (α9), which is involved in
protein-protein interactions; the remaining sites were distributed
uniformly throughout the LBD (Fig. 4). The set lining the ligand- 
binding pocket was in a position that could possibly form direct 
contacts with the bound ligand and may therefore contribute to 
the difference between the ligand-binding properties of the
human and rodent PXRs. The set distributed along α9 was
outwardly oriented.

By contrast, the longest α-helix, α10, has only four variable
sites, all extending toward the protein interior. The tertiary
structure of the PPAR-RXR heterodimer (Gampe Jr. et al. 2000)
reveals that the outer surface of α10 is involved in the
interaction with RXR. α10 probably functions similarly in other
heterodimeric partners of RXR, including PXR. Thus, variation of
the outward face of PXR α10 may be constrained by this important
function.



Exon Structures of DBD and LBD 

Information on the splice junctions, derived from BLAT align- 
ments of amino acid sequences of NR mRNAs to the genome, was 
used to further characterize the family. All splice junctions were 
conserved among orthologs of the various NR family members. 
When comparing paralogous members, informative patterns of 
conservation emerge within these two domains (Fig. 5). 

Eight patterns are evident in the DBD splice junctions (Fig. 
5A). The junction is located at various positions in between the 
two zinc finger motifs in four of the eight groups. It is located in 
the first zinc finger motif in the NR2B1-3, NR2C1&2 group, and 
it is located at different positions within the second zinc finger in 
NR2A1&2, NR2F6, and NR2E1 groups. 

The splice junction was lost from NR1H2&3, NR2F1&2, and 
NR6A1. Because these do not form a monophyletic group in the 
tree (Fig. 3A), the intron was probably lost in three separate 
events. Members of subfamilies NR1 and NR3 show little varia- 
tion in junction location, whereas subfamily NR2 has several 
variants. In two cases members of different subfamilies shared 
the same splice junction: NR1 and NR5, and NR1I and NR4. This 
result, taken together with the phylogenetic results described 
above, may suggest a complex evolutionary relationship between 
the subfamilies NR1, 4, and 5 (see Phylogenetic Analysis above). 
Alternatively, there could be preferred sites for acquiring introns. 
Elucidation of the principles governing the dynamics of intron 
acquisition and change over long evolutionary timescales is 
needed to understand these relationships. 
Figure 4 Variable sites in the LBD of PXR. (A) The variable sites, highlighted in gray, in the protein sequence alignment of the LBDs of the human,
mouse, and rat PXRs. The corresponding secondary structure is indicated below the sequence alignment: the α-helix is represented by the cylinder, and
the β-sheet by the parallelogram. The seven variable sites at which the amino acid residues line the ligand-binding pocket and the ones found in
α-helix 9 and its vicinity are boxed with the solid line and broken line, respectively. Other available LBD sequences of the rhesus, pig, rabbit, dog,
chicken, and zebrafish PXRs are omitted from the sequence alignment presented here, because their inclusion does not change
the general variation pattern. (B) The same sites, highlighted in yellow, in the tertiary structure of the LBD of the human PXR (the blue solid ribbon).
The agonist is shown as the small red molecular structure bound in the receptor's ligand-binding pocket, and the coactivator fragment is shown as
the green solid ribbon. Details of the variable sites, with the side chains of the amino acid residues at these sites shown in yellow, in the ligand-binding
pocket (C) and in α-helix 9 and its vicinity (D) are shown.



The LBD was less conserved, overall, than the DBD com- 
pared across the whole family (alignments in Suppl. Fig. 2). 
Within it, three sequence motifs were identified (see Methods), 
although none of those were as conserved as the zinc finger
motifs in the DBD. Motif I, spanning α-helices 3 and 4, was
previously described (see Wang et al. 1989 and Wurtz et al. 1996
for definition of helices in LBD). Motif II, α-helices 7-9, was
described in part (Wang et al. 1989 identified conservation in
helices 7 and 8). Steroid receptor groups NR3A and NR3C differed
from all others in that motif II was not detected. However,
sequence similarity to the left half of the consensus pattern (Fig.
5C) was easily observable (note crosshatching of NR3A1-3 and
NR3C1-4 in Fig. 5B, and see Suppl. Fig. 2; see also Wang et al.
1989). Motif III spans most of α-helices 10 and 11. Motifs I and II
are indicated schematically in Figure 5B (motif III is altered or
deleted in some NR isoforms, so we have omitted it from the 
figure pending further analysis of all of the individual splice vari- 
ants). 

Up to four splice junctions were found in the peptide sequences of the LBD region to which our analysis was confined (see the Pfam profiles used to identify the LBD, and Methods). The four introns were confined to distinct regions of the LBD defined by the structural motifs described above: the first lies within motif I; the second, between motifs I and II; the third, within motif II; and the fourth, downstream of motif II. Introns were lost multiple times at regions 1, 2, and 4. Moreover, the precise locations of introns 1, 2, and 4 were variable. In distinct contrast, intron 3 was invariant: it was present in all of the NRs except LXRβ (NR1H2), and it was always a phase-1 intron at the same amino acid position in motif II. Although conserved in position and phase, this intron varied in size from 123 bp in mouse NR2A2 to 53,000 bp in human NR5A2. Except in TLX and the steroid hormone receptor genes, there was a highly conserved aspartic acid (occasionally replaced by glutamic acid) on the splice acceptor side of the intron; this residue contributes to the polar interactions involved in dimerization of the NRs in which it is present.

The conserved LBD splice junction probably originated early in the family and was subsequently conserved in evolution: it was also observed in the LBD of Danio rerio SVP46 (NR2F5; data not shown). The selective pressure maintaining the splice junction could arise from conservation of amino acid sequence, since the aforementioned aspartic acid codon is split by this phase-1 splice junction. However, motif II is much less conserved than the zinc finger motifs or motif I, and some NR subfamilies have neither aspartic acid nor glutamic acid at the splice junction. Thus, some sequence or structural motif in the NR mRNA involved in its regulation, processing, or stability may instead determine the conservation of this splice junction. Because LXRβ (NR1H2), the single exception, has lost this splice junction, comparing its expression with that of other NR genes may shed light on this phenomenon.



Figure 3 Unrooted phylogenetic trees of the NR family. The same color scheme for NR subfamilies is used as in Fig. 1. Group-level designations (e.g., 0B, 1A, 1B, 6A) label the interior branches, and common gene names label the terminal branches. Bootstrap values, expressed as percentages, are indicated at the nodes (branch bifurcations). (A) A complete tree constructed from the multiple sequence alignment of the LBDs of all NRs found in rat, mouse, and human. Shading highlights groups exhibiting rapid evolution. (B) NR2 subfamily clade (orphan receptors) taken from the DBD tree. Shading highlights a group exhibiting increased conservation. (C) Portion of the DBD tree showing the relationship between subfamilies NR4 and NR5.
Figure 5 The gene structure encoding the DBD and LBD domains of the NR genes. Open bars are exons, drawn to scale; line segments, drawn at fixed length, give intron locations. (A) DBD splice junctions. Sequences are 75-78 aa in length. The shaded boxes indicate the locations of the two C4 zinc finger motifs within this highly conserved domain. Introns may be found at seven different locations in the DBD across the entire family, or may be absent. Vertical hash marks indicate the locations of junctions shared within the following groupings: a, NR2B, NR2C1,2; b, NR1A,1B,1C,1D,1F,1H4-5, NR5A; c, NR1I, NR4A; d, NR3A,3B,3C; e, NR2E3; f, NR2A, NR2F6; g, NR2E1; not shown is group h (NR1H, NR2F, and NR6A), which has no intron in the DBD. (B) LBD splice junctions. Sequences are 170 (NR1D2) to 208 (NR0B1) aa in length. Each row is a schematic drawing giving the relative locations of the splice junctions and the group of NRs sharing that splice junction pattern. The positions of splice junctions in orthologs were always the same, so species designations are omitted. The two conserved motifs (I and II; see text) in the LBDs are shown as hatched areas. The location of a highly conserved negatively charged residue (aspartic acid or glutamic acid) in motif II is marked by an inverted triangle. The four regions within which introns were found are indicated by slash marks: "\" in motif I, "|" in the intermotif region, no slash in motif II, and "/" after motif II (see text). (C) The consensus sequences of motifs I and II. The secondary structure of the corresponding part of the LBD, derived from crystallographic studies, is indicated below the sequence. Letters in bold correspond to the residues of the NR signature involved in stabilizing the canonical fold of the NR LBDs (see Wurtz et al. 1996).
Comparison of the 26 different splicing patterns of the sequence encoding the LBD (Fig. 5B) suggests that large-scale sequence changes, intron loss or gain, and exon addition or substitution played an important role in shaping the evolution of this family. Loss or gain of the first, second, or fourth LBD intron occurred in many NRs. Large-scale innovative changes in the coding sequence of the LBD region may have contributed significantly to the rise of some new NR genes. FXRβ (NR1H5) appears to have added one exon between the conserved motifs I and II. In the steroid receptors (ER of group NR3A; GR, MR, PR, and AR of group NR3C), the helix 9 homology of motif II abruptly disappears, beginning with the loss of the conserved aspartic acid on the splice acceptor side of the exon (see above). Although the downstream C-terminal boundary of the loss of homology is difficult to determine, this observation could be neatly explained by substitution of the exon downstream of the conserved third splice junction, an event leading to specificity for steroid ligands in groups NR3A and NR3C.

Further variation in LBD splice junction patterns may exist 
in other isoforms, and thus a full accounting of all isoforms, in 
these and other species, will be important. 

Conclusion 

The genomic comparison of the NR families from three related mammals affords new insight and raises new questions about the structure, function, and evolution of this important family of transcription factors. Despite the high degree of conservation among the NR sequences, there was clearly distinguishable species-specific variation in three groups. Among them, PXR and CAR, in group NR1I, share some ligands (Moore et al. 2000) and regulate overlapping but distinct sets of genes involved in xenobiotic detoxification (Maglich et al. 2002). Given the central role of CAR and PXR in xenobiotic metabolism and ingestion, these two NRs may have evolved faster in response to the different sets of environmental challenges encountered by humans, mice, and rats. The nature of the species-specific adaptations represented by the other two rapidly evolving groups, NR1H5 and NR0B1-2, awaits improved understanding of their functional roles.

Paralogous NR family members exhibit a variety of exon structures in both their DBDs and LBDs. Amid this variation, the conserved location of the splice junction in the second motif of the LBD stands out as a peculiar phenomenon. It may prove to be a more reliable signature for the NR genes than the C4 zinc finger. Similar findings have been reported in other gene families, for example, the chemoreceptor superfamily (Robertson et al. 2003) and the DEAD helicase genes (Boudet et al. 2001). An understanding of the selective constraints that preserve such ancient introns may lead to new understanding of protein or mRNA structure and processing.

METHODS 

Identification of Nuclear Receptor Genes in Human, Mouse, and Rat Genomes

Six structural and functional domains specific to members of the NR family were obtained from Pfam (Bateman et al. 2002). They are the ligand-binding domain (Pfam entry name: hormone_rec), found in all members of the family; the C4-type zinc finger DNA-binding domain (zf-C4), found in all but two members; and four modulator A/B domains, each specific to a given steroid receptor: androgen receptor (Androgen_recep), glucocorticoid receptor (GCR), estrogen receptor (Oest_recep), and progesterone receptor (Prog_receptor). The DBD sequence corresponded to a 75-78-residue segment starting two amino acid residues before the first conserved cysteine and encompassing both C4 zinc fingers. The LBD began at the twelfth residue of α-helix 3 and extended through α-helix 10 (Wurtz et al. 1996; Greschik et al. 1999).

The mRNA and protein sequences of 62 representative NRs (Robinson-Rechavi et al. 2001) were downloaded from GenBank. If the human gene sequence of an NR was available, the mouse and rat sequences of the same NR were also retrieved, where available. The Pfam domains present in these 62 NRs were identified using HMMPFAM (HMMER 2.3.1; Eddy 1998). Because the E-values for the NR domains were $10^{20}$ to $10^{50}$ times smaller than those of the other domains identified, the identification and presence of these domains in NRs were unambiguous.

The human, mouse, and rat genomic sequences used in this study were human genome build 34 of June 2003, the mouse genome build of February 2003, and the rat genome build of April 2003. To take advantage of parallel computing, each of the three genomes was partitioned into 750-kb segments with 2-kb overlaps. Only the NR domains identified in the previous step with stringent (very low) E-values were searched in the genomic sequences using GENEWISEDB (Wise 2.2.0; Birney and Durbin 2000). GENEWISEDB can usually predict the presence of a domain in a genome from the Pfam domain profile without any modification to the genomic sequence, but occasionally it introduced one or more frameshifts to produce a sensible alignment. Although GENEWISEDB labeled such predictions as pseudogenes, we treated them with extra care, because the need to introduce frameshifts may well result from sequencing errors in the genome.
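The tiling step can be sketched as follows. This is a minimal illustration only: the function name `tile_genome` is hypothetical, coordinates are assumed to be 0-based and half-open, and the text does not describe how the original pipeline generated its segments.

```python
from typing import Iterator, Tuple


def tile_genome(length: int, window: int = 750_000,
                overlap: int = 2_000) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) segments covering [0, length).

    Consecutive segments share `overlap` bases, so a domain hit that straddles
    a boundary of up to `overlap` bp still falls wholly inside one segment.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap  # advance by the window minus the shared bases
    start = 0
    while True:
        end = min(start + window, length)  # the last segment may be shorter
        yield start, end
        if end >= length:
            break
        start += step
```

For a 2-Mb sequence this yields three segments, (0, 750000), (748000, 1498000), and (1496000, 2000000). Hits lying in an overlap may be reported from two adjacent segments and would need to be de-duplicated before grouping.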

Domains identified in each genome were grouped by orientation and genomic coordinates and compared with the 62 NR protein sequences, with identity assigned from the best BLASTP hit. The GENEWISEDB search results were also parsed to create custom annotation tracks in the UCSC Genome Browser (http://genome.ucsc.edu/) to depict the exon-intron structure of the predicted domains and to enable cross-examination with mRNA/EST evidence, synteny, and genomic sequence conservation across species.
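Grouping hits by orientation and coordinates can be illustrated with a simple merge of nearby same-strand hits. This is a sketch under stated assumptions: `Hit` and `group_hits` are hypothetical names, and the distance threshold `max_gap` is invented for illustration, since the text does not say how close two domains had to be to count as one locus.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int


def group_hits(hits: List[Hit], max_gap: int = 100_000) -> List[Hit]:
    """Merge domain hits on the same chromosome and strand lying within max_gap bp.

    max_gap is a hypothetical threshold, not a value taken from the text.
    """
    loci: List[Hit] = []
    # Sorting NamedTuples orders by chrom, then strand, then start.
    for hit in sorted(hits):
        last = loci[-1] if loci else None
        if (last is not None and last.chrom == hit.chrom
                and last.strand == hit.strand
                and hit.start - last.end <= max_gap):
            # Extend the current locus to cover this hit.
            loci[-1] = last._replace(end=max(last.end, hit.end))
        else:
            loci.append(hit)
    return loci
```

Each merged locus (for example, a DBD hit and a downstream LBD hit on the same strand) would then be assigned an identity from its best BLASTP match.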

Pseudogenes were sought among NR genes that had more than one copy in a genome and for which the mRNA sequence of the gene or its orthologs was available. The mRNA sequence was aligned with TBLASTN and BLAT (Kent 2002) to the genomic sequences at the locations where the different copies of multicopy NR genes were found. A copy of an NR gene was considered a pseudogene if its sequence contained frameshifts or nonsense mutations that could not credibly be attributed to sequencing errors.
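The two disablement criteria can be screened for mechanically before the manual check against possible sequencing errors. The sketch below assumes the input is the spliced coding sequence of a candidate copy, read from its first base; `disabling_mutations` is a hypothetical name, and a length that is not a multiple of three is used only as a crude proxy for a frameshift.

```python
from typing import Dict, List, Union

STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_mutations(cds: str) -> Dict[str, Union[bool, List[int]]]:
    """Flag candidate pseudogene features in a spliced coding sequence."""
    seq = cds.upper()
    n_full = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, n_full, 3)]
    # Any stop codon before the final codon is treated as a nonsense mutation;
    # positions are reported as 1-based codon numbers.
    premature = [k for k, codon in enumerate(codons[:-1], start=1)
                 if codon in STOP_CODONS]
    return {
        # Crude frameshift proxy: coding length not divisible by 3.
        "frameshift": len(seq) % 3 != 0,
        "premature_stops": premature,
    }
```

A flagged copy would still need inspection, because, as noted above, both kinds of defect can arise from sequencing errors rather than from genuine pseudogenization.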

Statistical Test for Clustering of Nuclear Receptor Genes in the Rat Genome

The spatial distribution of the NRs in the rat genome was tested for clustering with a $\chi^2$ test (Zar 1984). The rat genome was divided into nonoverlapping 2.25-Mb segments, and the number ($X$) of NRs in each segment was tallied. The observed frequency ($f_o$) of each value of $X$ was tallied, and the corresponding expected frequency ($f_e$) was calculated from the Poisson probability $P(X)$. The test statistic was $\chi^2 = \sum (f_o - f_e)^2 / f_e = 17.702$ with 1 degree of freedom. Because $\chi^2_{0.001,1} = 10.828 < 17.702$, the hypothesis of a random distribution is rejected ($P < 0.001$).
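The test can be sketched as a Poisson goodness-of-fit computation. The text does not say how the frequency classes were pooled; this sketch assumes three classes ($X = 0$, $X = 1$, $X \geq 2$) with the Poisson mean estimated from the data, which gives $3 - 1 - 1 = 1$ degree of freedom, consistent with the value reported. The function name and the example counts are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

CHI2_CRIT_0_001_DF1 = 10.828  # critical value quoted in the text


def poisson_clustering_chi2(counts: List[int]) -> Tuple[float, int]:
    """Chi-square goodness of fit of per-segment NR counts to a Poisson model.

    Assumes classes X = 0, X = 1 and X >= 2 (a pooling chosen for illustration).
    """
    n = len(counts)
    lam = sum(counts) / n  # Poisson mean estimated from the data
    p0 = math.exp(-lam)
    p1 = lam * p0
    expected = [n * p0, n * p1, n * (1.0 - p0 - p1)]
    tally = Counter(min(x, 2) for x in counts)  # pool X >= 2 into one class
    observed = [tally[0], tally[1], tally[2]]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1  # minus 1 for the total, 1 for estimating lam
    return chi2, df
```

For a strongly clustered toy example (90 empty segments and 10 segments holding 5 NRs each), the statistic is about 44.6, well above 10.828, so randomness would be rejected at $P < 0.001$.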

Sequence Analyses 

The peptide sequences of the DBD and the LBD were identified and extracted from genomic sequence using GENEWISEDB. Sequences in each set were aligned using CLUSTALW, and the multiple sequence alignments were then inspected and manually refined in BioEdit. The C-terminal 5-10 residues were incorrect in about half of the sequences extracted from the genomes by GENEWISEDB; these were corrected to match the corresponding GenBank sequences. The nucleotide sequences of each domain were aligned according to their corresponding amino acid sequence alignment.
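Aligning nucleotide sequences according to the protein alignment amounts to threading each coding sequence onto its aligned peptide, codon by codon. The sketch below is an illustration of that idea; `thread_codons` is a hypothetical name, and it assumes the coding sequence starts at the first residue of the ungapped peptide.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project one row of a protein alignment onto its coding sequence.

    Each residue consumes the next three nucleotides of `cds`; each gap
    becomes a gap of three nucleotides, so codons stay in register.
    """
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            codon = cds[pos:pos + 3]
            if len(codon) < 3:
                raise ValueError("coding sequence is shorter than the ungapped protein")
            out.append(codon)
            pos += 3
    return "".join(out)
```

For example, `thread_codons("M-K", "ATGAAA")` gives `"ATG---AAA"`. Keeping codons intact in this way is what allows synonymous and nonsynonymous sites to be compared between orthologs.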

Corrected but unaligned LBD peptide sequences were searched for conserved sequence motifs with BLOCKMAKER (http://blocks.fhcrc.org/). BLOCKMAKER returns two sets of motifs generated by complementary methods of detecting ungapped regions of similarity (Henikoff et al. 1995). A positive identification required that both methods agree on the presence and location of the given motif within the sequence. Three motifs were obtained for all sequences except those of the steroid receptors and TLX (data not shown). Motif I included the "canonical LBD signature" spanning α-helices 3-5 (see Fig. 4B,C; Wang et al. 1989; Wurtz et al. 1996) of the 12 helices in the LBD. Motif II corresponded to a central portion of the LBD spanning helices 7-9, involved in dimerization (see Fig. 4B,C). Conservation in the first half of this motif, up to the aspartic acid (Fig. 4C), was observed by Wang et al. (1989), but others have noted more extended conservation (Laudet et al. 1992). Motif III spanned helices 10-11. This region is subject to alternative splicing in some NR genes, so it was set aside pending a complete description of the family's isoforms.

CLUSTALW correctly aligned the residues corresponding to motifs I and III but not those of motif II. In particular, the alignment of subfamily NR0B was greatly improved by using motif II as a guide; minor adjustments were required in some other subfamilies. Phylogenetic trees were reconstructed from both the protein and DNA alignments using the neighbor-joining implementation in the PAUP* 4.0 software package (Swofford 2003), with 1000 bootstrap replicates.
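The bootstrap itself is straightforward to illustrate: each replicate resamples alignment columns with replacement, a tree is built from each replicate, and a node's support is the percentage of replicate trees containing that clade. The sketch below covers only the resampling and the final tally; tree building (neighbor joining in PAUP* in the text) is not reproduced, and the function names and the clade-as-set representation are assumptions for illustration.

```python
import random
from typing import FrozenSet, List, Set


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one pseudo-replicate: alignment columns resampled with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are used for every row, so sites stay homologous.
    return ["".join(row[c] for c in cols) for row in alignment]


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

With `rng = random.Random(seed)`, calling `bootstrap_alignment` 1000 times would give the replicates; the support values printed at the nodes in Fig. 3 are percentages of this kind.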

of every orthologous gene pair was calculated as the measure of sequence evolution (Li et al. 1985). Student's t-test was used to detect positive Darwinian selection.

Splice Junction Analysis 

Splice junctions in the coding sequences were located by using BLAT to match all protein sequences (the 62 representative members from GenBank described above) to the corresponding genome. The BLAT exon segments were manually aligned in Excel to bring into register the DBD and LBD of each protein from the three genomes. This enabled rapid curation of the exons found by BLAT, including the elimination of false-positive exons caused by single-residue indels, missing small N-terminal exons, and other splice-site ambiguities that may have misled BLAT.
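Once exon boundaries are known, the phase of each intron and the codon it interrupts follow from the cumulative coding length upstream of it. A phase-1 intron, such as the conserved intron 3 in motif II, falls after the first base of a codon and so splits that codon. The sketch assumes a list of coding-exon lengths in transcript order (the coding portion only); `intron_phases` is a hypothetical name.

```python
from itertools import accumulate
from typing import List, Tuple


def intron_phases(exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (phase, codon) for each intron between consecutive coding exons.

    phase: coding nucleotides upstream of the intron, modulo 3 (0, 1 or 2).
    codon: 1-based number of the codon the intron interrupts (phase 1 or 2),
           or the codon immediately downstream of it (phase 0).
    """
    result = []
    # Cumulative coding length at each exon end; the last exon has no intron after it.
    for upstream in list(accumulate(exon_lengths))[:-1]:
        result.append((upstream % 3, upstream // 3 + 1))
    return result
```

For example, coding exons of 100 and 50 nt give `[(1, 34)]`: a phase-1 intron that splits codon 34. Comparing the (phase, codon) pairs across aligned orthologs and paralogs is one way to check whether a junction sits at the same amino acid position.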